IIIT-H: A Corpus-Driven Co-occurrence Based Probabilistic Model for Noun Compound Paraphrasing

Authors

  • Nitesh Surtani
  • Arpita Batra
  • Urmi Ghosh
  • Soma Paul
Abstract

This paper presents a system that automatically generates a set of plausible paraphrases for a given noun compound and ranks them in decreasing order of usage, as represented by the confidence values provided by human annotators. Our system implements a corpus-driven probabilistic co-occurrence based model for predicting paraphrases: it uses a seed list of paraphrases extracted from a corpus to predict other paraphrases based on their co-occurrences. The corpus study reveals that prepositional paraphrases for noun compounds are quite frequent and well covered, whereas verb paraphrases are scarce, which makes the model unsuitable as a standalone corpus-driven approach. Therefore, to predict other paraphrases, we adopt a two-fold approach: (i) prediction based on verb-verb co-occurrences, when the number of seed paraphrases exceeds a threshold; and (ii) prediction based on the semantic relation of the noun compound (NC), otherwise. The system achieves a comparable score of 0.23 for the isomorphic system while maintaining a score of 0.26 for the non-isomorphic system.
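The two-fold prediction strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, the threshold value, or the relation inventory, so everything below is an assumption made for illustration: the function names `build_cooccurrence` and `predict_paraphrases`, the normalized co-occurrence score, the default `threshold`, and the `relation_verbs` lookup table are all hypothetical and are not the authors' implementation.

```python
from collections import Counter, defaultdict


def build_cooccurrence(paraphrase_sets):
    """Count how often two verbs paraphrase the same noun compound.

    paraphrase_sets: iterable of verb lists, one list per noun compound.
    """
    cooc = defaultdict(Counter)
    for verbs in paraphrase_sets:
        unique = set(verbs)
        for v in unique:
            for w in unique:
                if v != w:
                    cooc[v][w] += 1
    return cooc


def predict_paraphrases(seeds, cooc, relation_verbs, relation, threshold=2):
    """Rank candidate verb paraphrases for one noun compound.

    Assumed rule: with more than `threshold` seed verbs, score candidates
    by summed relative verb-verb co-occurrence with the seeds; otherwise
    fall back to verbs associated with the compound's semantic relation.
    """
    scores = Counter()
    if len(seeds) > threshold:
        seed_set = set(seeds)
        for s in seed_set:
            neighbours = cooc.get(s, Counter())
            total = sum(neighbours.values())
            if total == 0:
                continue
            for w, count in neighbours.items():
                if w not in seed_set:
                    scores[w] += count / total
    else:
        scores.update(relation_verbs.get(relation, {}))
    # Sort by descending score, breaking ties alphabetically.
    return [v for v, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


# Toy data, invented for this example only.
training = [
    ["contain", "hold", "include"],
    ["contain", "hold"],
    ["contain", "include", "have"],
]
cooc = build_cooccurrence(training)
relation_verbs = {"CONTAINER": {"contain": 0.6, "hold": 0.3}}

enough_seeds = predict_paraphrases(["hold", "have"], cooc, relation_verbs,
                                   "CONTAINER", threshold=1)
few_seeds = predict_paraphrases(["hold"], cooc, relation_verbs,
                                "CONTAINER", threshold=1)
```

In this sketch the branch on `len(seeds) > threshold` mirrors cases (i) and (ii) of the abstract. How the real system weights co-occurrences and maps semantic relations to verbs would have to be taken from the full paper.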

Similar resources

Yet Another Compound Noun Analysis Using Word Co-occurrence

A compound noun has a structure similar to that of a simple sentence, for example a predicate-argument structure in many cases, and it expresses a compound meaning by combining several nouns. In this paper, we suggest a new method to analyze Korean noun phrases with possessive and compound nouns by using a statistical model based on verb-noun co-occurrence relations and word collocations between...

Influence of accurate compound noun splitting on bilingual vocabulary extraction

The influence of compound noun splitting on a German-Polish bilingual vocabulary extraction task is investigated. To accomplish this, several unsupervised methods for increasingly accurate compound noun splitting are introduced. Bilingual evidence from a parallel German-Polish corpus and co-occurrence counts from the web are used to disambiguate compound noun analyses directly. These collected ...

A System for Compound Noun Multiword Expression Extraction for Hindi

Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...

Co-occurrence Contexts for Noun Compound Interpretation

Contextual information extracted from corpora is frequently used to model semantic similarity. We discuss distinct classes of context types and compare their effectiveness for compound noun interpretation. Contexts corresponding to word-word similarity perform better than contexts corresponding to relation similarity, even when relational co-occurrences are extracted from a much larger corpus. ...

Expanding Parallel Resources for Medium-Density Languages for Free

We discuss a previously proposed method for augmenting parallel corpora of limited size for the purposes of machine translation through monolingual paraphrasing of the source language. We develop a three-stage shallow paraphrasing procedure to be applied to the Swedish-Bulgarian language pair for which limited parallel resources exist. The source language exhibits specifics not typical of high-...

Publication date: 2013